Unsupervised training and directed manual transcription for LVCSR

Authors

  • Kai Yu
  • Mark J. F. Gales
  • Lan Wang
  • Philip C. Woodland
Abstract

A significant cost in obtaining acoustic training data is the generation of accurate transcriptions. When no transcription is available, unsupervised training techniques must be used. Furthermore, discriminative training has become a standard feature of state-of-the-art large vocabulary continuous speech recognition (LVCSR) systems. In unsupervised training, unlabelled data are recognised using a seed model, and the hypotheses from the recognition system are used as transcriptions for training. Compared with maximum likelihood training, discriminative training is more sensitive to the quality of the transcriptions. One approach to this issue is data selection, where only well-recognised data are selected for training. A more effective alternative, and the key contribution of this work, is an active learning technique called directed manual transcription, in which a relatively small amount of poorly recognised data is manually transcribed to supplement the automatic transcriptions. Experiments show that data selection for discriminative training yields disappointing improvements on data that are mismatched to the type of data used to train the seed model. In contrast, directed manual transcription can yield significant improvements in recognition accuracy on all types of data. © 2010 Elsevier B.V. All rights reserved.
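The two strategies contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how "well recognised" or "poorly recognised" data are identified, so the utterance-level `confidence` score, the `threshold`, the duration-based `budget_s`, and all class and function names here are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    utt_id: str
    hypothesis: str    # 1-best output of the seed model
    confidence: float  # ASSUMED utterance-level confidence in [0, 1]
    duration_s: float  # audio length in seconds


def select_well_recognised(utts: List[Utterance], threshold: float) -> List[Utterance]:
    """Data selection: keep only utterances whose assumed confidence
    reaches the threshold; their hypotheses serve as transcriptions."""
    return [u for u in utts if u.confidence >= threshold]


def direct_manual_transcription(
    utts: List[Utterance], budget_s: float
) -> Tuple[List[Utterance], List[Utterance]]:
    """Directed manual transcription: send the least confident utterances
    to human transcribers until the (hypothetical) duration budget would be
    exceeded; all remaining utterances keep their automatic transcriptions."""
    manual: List[Utterance] = []
    used = 0.0
    for u in sorted(utts, key=lambda x: x.confidence):
        if used + u.duration_s > budget_s:
            break
        manual.append(u)
        used += u.duration_s
    manual_ids = {u.utt_id for u in manual}
    automatic = [u for u in utts if u.utt_id not in manual_ids]
    return manual, automatic


utts = [
    Utterance("a", "hyp a", 0.9, 10.0),
    Utterance("b", "hyp b", 0.2, 10.0),
    Utterance("c", "hyp c", 0.5, 10.0),
]
selected = select_well_recognised(utts, threshold=0.6)
manual, automatic = direct_manual_transcription(utts, budget_s=15.0)
```

In this sketch, data selection discards the poorly recognised utterances, whereas directed manual transcription keeps all the data and spends a small manual budget where the seed model is least reliable.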


Similar resources

Unsupervised Training with Directed Manual Transcription for Recognising Mandarin

The performance of unsupervised discriminative training has been found to be highly dependent on the accuracy of the initial automatic transcription. This paper examines a strategy where a relatively small amount of poorly recognised data are manually transcribed to supplement the automatically transcribed data. Experiments were carried out on a Mandarin broadcast transcription task using both ...


Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio

The performance of unsupervised discriminative training has been found to be highly dependent on the accuracy of the initial automatic transcription. This paper examines a strategy where a relatively small amount of poorly recognised data are manually transcribed to supplement the automatically transcribed data. Experiments were carried out on a Mandarin broadcast transcription task using both ...


Towards automatic learning in LVCSR: rapid development of a Persian broadcast transcription system

We present a new method for automatic learning and refining of pronunciations for large vocabulary continuous speech recognition which starts from a small amount of transcribed data and uses automatic transcription techniques for additional untranscribed speech data. The recognition performance of speech recognition systems usually depends on the available amount and quality of the transcribed ...


PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds

This paper presents a language-model training method for improving automatic transcription of online spoken contents. Unlike previously studied LVCSR tasks such as broadcast news and lectures, large-sized task-specific corpora for training language models cannot be prepared and used in recognition because of the diversity of topics, vocabularies, and speaking styles. To overcome difficulties in...


A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR

This paper presents an experimental study of a maximum likelihood (ML) approach to irrelevant variability normalization (IVN) based training and unsupervised online adaptation for large vocabulary continuous speech recognition. A moving-window based frame labeling method is used for acoustic sniffing. The IVN-based approach achieves a 10% relative word error rate reduction over an ML-trained bas...



Journal:
  • Speech Communication

Volume 52


Publication date: 2010